Project - Uber Data Analysis

Using Python To Answer:

How far do people travel with Uber?

At what hour do most people take an Uber to their destination?

What are the purposes of the trips?

Which day of the week has the highest number of trips?

How many trips are there on each day of the month?

How many trips are there in each month?

Where do most trips start (the most common boarding points)?


Importing Libraries

In [1]:
import calendar

import pandas as pd
import plotly.express as px
import plotly.offline as pyo

# Load Plotly's JavaScript so the figures render inline in the notebook
pyo.init_notebook_mode()
In [2]:
data=pd.read_csv('UberDrives.csv')
data
Out[2]:
      START_DATE*       END_DATE*         CATEGORY*  START*            STOP*              MILES*  PURPOSE*
0     1/1/2016 21:11    1/1/2016 21:17    Business   Fort Pierce       Fort Pierce           5.1  Meal/Entertain
1     1/2/2016 1:25     1/2/2016 1:37     Business   Fort Pierce       Fort Pierce           5.0  NaN
2     1/2/2016 20:25    1/2/2016 20:38    Business   Fort Pierce       Fort Pierce           4.8  Errand/Supplies
3     1/5/2016 17:31    1/5/2016 17:45    Business   Fort Pierce       Fort Pierce           4.7  Meeting
4     1/6/2016 14:42    1/6/2016 15:49    Business   Fort Pierce       West Palm Beach      63.7  Customer Visit
...   ...               ...               ...        ...               ...                   ...  ...
1151  12/31/2016 13:24  12/31/2016 13:42  Business   Kar?chi           Unknown Location      3.9  Temporary Site
1152  12/31/2016 15:03  12/31/2016 15:38  Business   Unknown Location  Unknown Location     16.2  Meeting
1153  12/31/2016 21:32  12/31/2016 21:50  Business   Katunayake        Gampaha               6.4  Temporary Site
1154  12/31/2016 22:08  12/31/2016 23:51  Business   Gampaha           Ilukwatta            48.2  Temporary Site
1155  Totals            NaN               NaN        NaN               NaN               12204.7  NaN

1156 rows × 7 columns
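The last line of the raw file is a `Totals` summary row (index 1155 above), not a trip; the notebook later removes it with `dropna`. A minimal alternative sketch, assuming the footer is always exactly one line, skips it at read time with `read_csv(..., skipfooter=1, engine="python")` (`skipfooter` is only supported by the Python parser). The CSV text is a small hypothetical sample in the same layout, not the real file, and `trips` is an illustrative name.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the layout of UberDrives.csv,
# including the trailing "Totals" footer row.
SAMPLE_CSV = """START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*
1/1/2016 21:11,1/1/2016 21:17,Business,Fort Pierce,Fort Pierce,5.1,Meal/Entertain
1/2/2016 1:25,1/2/2016 1:37,Business,Fort Pierce,Fort Pierce,5.0,
Totals,,,,,10.1,"""

# skipfooter drops the last N lines; it requires the Python parser engine.
trips = pd.read_csv(io.StringIO(SAMPLE_CSV), skipfooter=1, engine="python")

# Strip the trailing '*' from every column name.
trips.columns = trips.columns.str.rstrip("*")

# With the footer gone, every START_DATE value is a real timestamp.
trips["START_DATE"] = pd.to_datetime(trips["START_DATE"], format="%m/%d/%Y %H:%M")
print(trips[["START_DATE", "MILES", "PURPOSE"]])
```

Parsing dates directly after loading works here only because no summary text is left in the column.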

Data Cleaning¶

In [3]:
# Rename the columns to drop the trailing '*' from the original names
data.columns='START_DATE END_DATE CATEGORY START STOP MILES PURPOSE'.split(' ')
data
Out[3]:
      START_DATE        END_DATE          CATEGORY   START             STOP                MILES  PURPOSE
0     1/1/2016 21:11    1/1/2016 21:17    Business   Fort Pierce       Fort Pierce           5.1  Meal/Entertain
1     1/2/2016 1:25     1/2/2016 1:37     Business   Fort Pierce       Fort Pierce           5.0  NaN
2     1/2/2016 20:25    1/2/2016 20:38    Business   Fort Pierce       Fort Pierce           4.8  Errand/Supplies
3     1/5/2016 17:31    1/5/2016 17:45    Business   Fort Pierce       Fort Pierce           4.7  Meeting
4     1/6/2016 14:42    1/6/2016 15:49    Business   Fort Pierce       West Palm Beach      63.7  Customer Visit
...   ...               ...               ...        ...               ...                   ...  ...
1151  12/31/2016 13:24  12/31/2016 13:42  Business   Kar?chi           Unknown Location      3.9  Temporary Site
1152  12/31/2016 15:03  12/31/2016 15:38  Business   Unknown Location  Unknown Location     16.2  Meeting
1153  12/31/2016 21:32  12/31/2016 21:50  Business   Katunayake        Gampaha               6.4  Temporary Site
1154  12/31/2016 22:08  12/31/2016 23:51  Business   Gampaha           Ilukwatta            48.2  Temporary Site
1155  Totals            NaN               NaN        NaN               NaN               12204.7  NaN

1156 rows × 7 columns

In [4]:
data.isna().sum()
Out[4]:
START_DATE      0
END_DATE        1
CATEGORY        1
START           1
STOP            1
MILES           0
PURPOSE       503
dtype: int64
In [5]:
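# Label trips with no recorded purpose as 'None' instead of NaN
# (assign the result back; inplace fillna on a column attribute is unreliable in recent pandas)
data['PURPOSE'] = data['PURPOSE'].fillna('None')
# The remaining NaNs all come from the final 'Totals' summary row, so this drops that row
data.dropna(inplace=True)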
data
Out[5]:
      START_DATE        END_DATE          CATEGORY   START             STOP                MILES  PURPOSE
0     1/1/2016 21:11    1/1/2016 21:17    Business   Fort Pierce       Fort Pierce           5.1  Meal/Entertain
1     1/2/2016 1:25     1/2/2016 1:37     Business   Fort Pierce       Fort Pierce           5.0  None
2     1/2/2016 20:25    1/2/2016 20:38    Business   Fort Pierce       Fort Pierce           4.8  Errand/Supplies
3     1/5/2016 17:31    1/5/2016 17:45    Business   Fort Pierce       Fort Pierce           4.7  Meeting
4     1/6/2016 14:42    1/6/2016 15:49    Business   Fort Pierce       West Palm Beach      63.7  Customer Visit
...   ...               ...               ...        ...               ...                   ...  ...
1150  12/31/2016 1:07   12/31/2016 1:14   Business   Kar?chi           Kar?chi               0.7  Meeting
1151  12/31/2016 13:24  12/31/2016 13:42  Business   Kar?chi           Unknown Location      3.9  Temporary Site
1152  12/31/2016 15:03  12/31/2016 15:38  Business   Unknown Location  Unknown Location     16.2  Meeting
1153  12/31/2016 21:32  12/31/2016 21:50  Business   Katunayake        Gampaha               6.4  Temporary Site
1154  12/31/2016 22:08  12/31/2016 23:51  Business   Gampaha           Ilukwatta            48.2  Temporary Site

1155 rows × 7 columns

In [6]:
data.START_DATE = pd.to_datetime(data.START_DATE, format="%m/%d/%Y %H:%M")
data.END_DATE = pd.to_datetime(data.END_DATE, format="%m/%d/%Y %H:%M")
data['DAYS']=[calendar.day_name[i] for i in data.START_DATE.dt.dayofweek.to_list()]
data.head()
Out[6]:
   START_DATE           END_DATE             CATEGORY  START        STOP             MILES  PURPOSE          DAYS
0  2016-01-01 21:11:00  2016-01-01 21:17:00  Business  Fort Pierce  Fort Pierce        5.1  Meal/Entertain   Friday
1  2016-01-02 01:25:00  2016-01-02 01:37:00  Business  Fort Pierce  Fort Pierce        5.0  None             Saturday
2  2016-01-02 20:25:00  2016-01-02 20:38:00  Business  Fort Pierce  Fort Pierce        4.8  Errand/Supplies  Saturday
3  2016-01-05 17:31:00  2016-01-05 17:45:00  Business  Fort Pierce  Fort Pierce        4.7  Meeting          Tuesday
4  2016-01-06 14:42:00  2016-01-06 15:49:00  Business  Fort Pierce  West Palm Beach   63.7  Customer Visit   Wednesday
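Because `dt.dayofweek` counts Monday as 0, indexing `calendar.day_name` with it yields the weekday names. pandas also offers `Series.dt.day_name()`, which returns the same English names directly (assuming the default locale). A small sketch comparing both on two timestamps copied from the rows above:

```python
import calendar

import pandas as pd

# Two start times copied from the first rows shown above (illustrative only).
starts = pd.Series(pd.to_datetime(["2016-01-01 21:11", "2016-01-05 17:31"]))

# Approach used in the notebook: map dayofweek (Monday=0) through calendar.day_name.
via_calendar = [calendar.day_name[i] for i in starts.dt.dayofweek.to_list()]

# Vectorised pandas alternative; the default locale gives English names.
via_pandas = starts.dt.day_name()

print(via_pandas.tolist())  # ['Friday', 'Tuesday']
```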

Different Categories

In [7]:
category=pd.DataFrame(dict(data.CATEGORY.value_counts()).items(),columns=['Category','No. Of Rides'])
fig=px.bar(category,
            x=category.Category,
            y=category['No. Of Rides'],
            title='Different Categories Of Drives',
            text=category['No. Of Rides'],
            height=500)
fig.show()
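Each chart builds its DataFrame by turning `value_counts()` into a dict and back. A sketch of a more direct pandas idiom, `rename_axis` followed by `Series.reset_index(name=...)`, which gives the same two columns; the category labels below are hypothetical stand-ins for `data.CATEGORY`:

```python
import pandas as pd

# Hypothetical category labels standing in for data.CATEGORY.
categories = pd.Series(["Business", "Business", "Personal", "Business"], name="CATEGORY")

# Notebook approach: round-trip through a dict of (label, count) pairs.
via_dict = pd.DataFrame(
    dict(categories.value_counts()).items(), columns=["Category", "No. Of Rides"]
)

# Idiomatic alternative: name the index, then turn it into a column.
via_reset = (
    categories.value_counts()
    .rename_axis("Category")
    .reset_index(name="No. Of Rides")
)
print(via_reset)
```

Both frames can be passed to `px.bar` unchanged; the second avoids the intermediate dict.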

How Far Do People Travel With Uber

In [8]:
fig=px.histogram(data,x=data.MILES,
                 text_auto=True,  # label each bar with its trip count
                 title='Distance Travelled On Uber (Miles)',
                 height=600)
fig.update_traces(textposition='outside')
fig.show()
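A histogram shows the shape of the distance distribution; counting trips per distance band gives the same information as numbers. The sketch below uses `pd.cut` with hand-picked edges (0 to 5, 5 to 10, 10 to 20 and over 20 miles are illustrative choices, not taken from the analysis) on ten distances copied from the rows shown earlier, in place of `data.MILES`:

```python
import numpy as np
import pandas as pd

# Ten distances (miles) copied from the rows shown earlier, standing in for data.MILES.
miles = pd.Series([5.1, 5.0, 4.8, 4.7, 63.7, 3.9, 16.2, 6.4, 48.2, 0.7])

# Illustrative bin edges; by default each band includes its upper edge, e.g. (0, 5].
edges = [0, 5, 10, 20, np.inf]
labels = ["0-5", "5-10", "10-20", "20+"]
bands = pd.cut(miles, bins=edges, labels=labels)

# Count trips per band, keeping the bands in distance order.
band_counts = bands.value_counts().sort_index()
print(band_counts)
```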

The Hour At Which Most People Take Uber To Their Destination

In [9]:
hour=pd.DataFrame(dict(data.START_DATE.dt.hour.value_counts()).items(),columns='Hours Frequency'.split(' '))
hour.Hours=hour.Hours.astype(str)+' hrs'
fig=px.bar(hour,
           x=hour.Hours,
           y=hour.Frequency,
           title='Number of Trips Vs Hours',
           text=hour.Frequency,
           height=500)
fig.update_traces(textposition='outside')
fig.show()
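Because the hour table is built from `value_counts()`, the bars are ordered by frequency and hours with no trips are absent. If a chronological axis is preferred, a sketch assuming hypothetical start times reindexes the counts over all 24 hours:

```python
import pandas as pd

# Hypothetical start times standing in for data.START_DATE.
starts = pd.Series(pd.to_datetime([
    "2016-01-01 21:11", "2016-01-02 01:25", "2016-01-02 20:25",
    "2016-01-05 17:31", "2016-01-06 14:42", "2016-01-06 17:15",
]))

# Count trips per start hour, then reindex over all 24 hours so that
# hours with no trips appear as 0 and the axis runs 0..23 in order.
per_hour = starts.dt.hour.value_counts().reindex(range(24), fill_value=0)

# The busiest hour is the label with the largest count.
busiest_hour = per_hour.idxmax()
print(busiest_hour, per_hour[busiest_hour])
```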

Check The Purpose Of Trips

In [10]:
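purpose=pd.DataFrame(dict(data.PURPOSE.value_counts()).items(),columns='Purpose Frequency'.split())
# Exclude trips with no recorded purpose by label, not by position
# (dropping row 0 only works while 'None' happens to be the most frequent value)
purpose=purpose[purpose.Purpose!='None']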
fig=px.bar(purpose,
           x=purpose.Purpose,
           y=purpose.Frequency,
           title='Number of Trips Vs Purpose',
           text=purpose.Frequency,
           height=500)
fig.update_traces(textposition='outside')
fig.show()

The Day With The Highest Number Of Trips

In [11]:
days=pd.DataFrame(dict(data.DAYS.value_counts()).items(),columns='Days Frequency'.split())
fig=px.bar(days,
           x=days.Days,
           y=days.Frequency,
           title='Number of Trips Vs Days',
           text=days.Frequency,
           height=500)
fig.update_traces(textposition='outside')
fig.show()
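The weekday bars also follow frequency order. To show them Monday to Sunday instead, one option is an ordered categorical built from `calendar.day_name`; the weekday labels below are a hypothetical sample standing in for `data.DAYS`:

```python
import calendar

import pandas as pd

# Hypothetical weekday labels standing in for data.DAYS.
days = pd.Series(["Friday", "Saturday", "Saturday", "Tuesday", "Wednesday"])

# An ordered categorical makes sorting follow Monday..Sunday instead of
# frequency or alphabetical order; calendar.day_name starts on Monday.
week_order = list(calendar.day_name)
days = pd.Series(pd.Categorical(days, categories=week_order, ordered=True))

# value_counts on a categorical includes unobserved days (count 0);
# sort_index then puts them in calendar order.
per_weekday = days.value_counts().sort_index()
print(per_weekday)
```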

The Number Of Trips Per Day Of The Month

In [12]:
dates=pd.DataFrame(dict(data.START_DATE.dt.day.value_counts()).items(),columns='Dates Frequency'.split(' '))
dates.sort_values(inplace=True,by='Dates')
dates.Dates=dates.Dates.astype(str)
fig=px.bar(dates,
           x=dates.Dates,
           y=dates.Frequency,
           title='Number of Trips Vs Dates',
           text=dates.Frequency,
           height=500)
fig.update_traces(textposition='outside')
fig.show()
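Note that `dt.day` is the day of the month (1 to 31), so this chart pools each day number across all twelve months. In 2016, day 31 occurs in only 7 months and day 30 in 11, so their bars are lower partly for that reason. A sketch, assuming hypothetical timestamps, that counts with `value_counts().sort_index()` (replacing the separate `sort_values` step) and divides by how many 2016 months contain each day number:

```python
import pandas as pd

# Hypothetical start times standing in for data.START_DATE.
starts = pd.Series(pd.to_datetime([
    "2016-01-02 01:25", "2016-01-02 20:25", "2016-01-05 17:31",
    "2016-02-02 09:00", "2016-12-31 13:24",
]))

# Trips per day number (1..31), pooled across months, in day order.
per_day_of_month = starts.dt.day.value_counts().sort_index()

# How many months of 2016 contain each day number (29 -> 12, 30 -> 11, 31 -> 7).
months_with_day = pd.Series(
    pd.date_range("2016-01-01", "2016-12-31", freq="D").day
).value_counts().sort_index()

# Illustrative adjustment: average trips per occurrence of each day number.
per_occurrence = per_day_of_month / months_with_day.reindex(per_day_of_month.index)
print(per_occurrence)
```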

The Number Of Trips Per Month

In [13]:
months=pd.DataFrame(dict(data.START_DATE.dt.month.value_counts()).items(),columns='Months Trips'.split())
months.sort_values(by='Months',inplace=True)
months.Months=months.Months.astype(str)
fig=px.bar(months,
           x=months.Months,
           y=months.Trips,
           title='Number of Trips Vs Months',
           text=months.Trips,
           height=500)
fig.update_traces(textposition='outside')
fig.show()
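The month axis shows month numbers. A sketch that relabels them with `calendar.month_abbr` and keeps months without trips as 0; the timestamps are hypothetical stand-ins for `data.START_DATE`:

```python
import calendar

import pandas as pd

# Hypothetical start times standing in for data.START_DATE.
starts = pd.Series(pd.to_datetime([
    "2016-01-01 21:11", "2016-01-06 14:42", "2016-03-10 08:00", "2016-12-31 22:08",
]))

# Count trips per month number (1..12), keep calendar order,
# and include months with no trips as 0.
per_month = starts.dt.month.value_counts().reindex(range(1, 13), fill_value=0)

# calendar.month_abbr[0] is an empty string, so month numbers 1..12 index it directly.
per_month.index = [calendar.month_abbr[m] for m in per_month.index]
print(per_month)
```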

The Starting Points Of Trips

In [14]:
starting=pd.DataFrame(dict(data.START.value_counts().nlargest(10,keep='all')).items(),columns=['Starting Location','Trips'])
fig=px.bar(starting,
           x=starting['Starting Location'],
           y=starting.Trips,
           title='The Starting Points Of Trips',
           text=starting.Trips,
           height=500)
fig.update_traces(textposition='outside')
fig.show()
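`nlargest(10, keep='all')` can return more than ten locations: every location tied with the tenth-largest count is kept. A small sketch with hypothetical location counts makes the difference from the default `keep='first'` visible:

```python
import pandas as pd

# Hypothetical start locations; Morrisville and Durham tie for second place.
starts = pd.Series(["Cary", "Cary", "Cary", "Morrisville", "Morrisville",
                    "Durham", "Durham", "Raleigh"])
counts = starts.value_counts()

# Default keep='first': exactly 2 rows, so one tied location is dropped.
top_first = counts.nlargest(2)

# keep='all': every location tied with the 2nd-largest count is kept (3 rows).
top_all = counts.nlargest(2, keep="all")
print(top_all)
```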
In [15]:
!jupyter nbconvert --to html Uber_Analysis.ipynb
[NbConvertApp] Converting notebook Uber_Analysis.ipynb to html
[NbConvertApp] Writing 4387838 bytes to Uber_Analysis.html